Burkholderia sp. strain WSM2232 is an aerobic, motile, Gram-negative, non-spore-forming, acid-tolerant rod that was trapped in 2001 from acidic soil collected from Karijini National Park (Australia) using Gastrolobium capitatum as a host. WSM2232 was effective in nitrogen fixation with G. capitatum but subsequently lost symbiotic competence during long-term storage. Here we describe the features of Burkholderia sp. strain WSM2232, together with genome sequence information and its annotation. The 7,208,311 bp standard-draft genome is arranged into 72 scaffolds of 72 contigs containing 6,322 protein-coding genes and 61 RNA-only encoding genes. The loss of symbiotic capability can now be attributed to the loss of nodulation and nitrogen fixation genes from the genome. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.



Introduction 

Burkholderia spp. are a diverse group of organisms capable of thriving in a wide range of environments, many of which form mutualistic associations with organisms such as fungi and plants [1]. The development in the 1960s and 1970s of a rational classification system for Pseudomonas species resulted in proposals to give different generic names to taxonomically distinct groups. The organisms previously classified within Pseudomonas rRNA similarity Group II were transferred into the new genus Burkholderia [2]. All Burkholderia species described at that time were phytopathogens or opportunistic mammalian pathogens, and the type species B. cepacia became a growing community health concern in immunocompromised and cystic fibrosis patients [3-5]. As more Burkholderia spp. were isolated, it became apparent that the genus is a far more complex mix, including numerous soil-inhabiting species capable of degrading heavy metals and environmental contaminants [6,7]. Further reports identified plant growth promoting (PGP) species and legume microsymbionts. This led to a paradigm shift in rhizobiology and resulted in numerous novel Burkholderia spp. descriptions [8-10].

Most PGP or legume-microsymbiont species of Burkholderia have been isolated in South America from Mimosa spp. or in South Africa from Papilionoideae legumes, and until recently B. graminis was the only described PGP bacterial species isolated from Australia, from the maize rhizosphere [11]. Australian Burkholderia have been isolated as nodule occupants from some Acacia spp. [12]; however, none have been authenticated or tested for the nodulation of other legumes. There are few data on the symbiosis between Burkholderia and legumes in Australia compared with South Africa and South America. Burkholderia sp. WSM2232 was trapped from acidic soil (pHCaCl2 4.8) collected from Karijini National Park (Western Australia) using Gastrolobium capitatum as a host. Sites where the soil pH was higher (pHCaCl2 > 7) did not contain any Burkholderia symbionts but did contain numerous Bradyrhizobium and Rhizobium spp. (Watkin, unpublished). Soil pH is an edaphic variable that controls microbial biogeography [13], and the acid tolerance of Burkholderia has been shown to account for the biogeographical distribution of this genus [14].

The Genomic Standards Consortium

Walker et al.

The symbiotic capacity of WSM2232 was authenticated in axenic glasshouse trials by inoculating G. capitatum grown under nitrogen-free conditions. Inoculated plants nodulated by WSM2232 produced significantly greater mass than uninoculated controls. WSM2232 was subcultured and placed in long-term storage as frozen laboratory glycerol stocks. After revival, inoculation of the isolate onto endemic Australian legumes failed to elicit a symbiotic response. The reason for the loss of the symbiotic phenotype has, until now, not been identified.

The genome of Burkholderia strain WSM2232 is one of two Australian Burkholderia genomes (the other being that of WSM2230, GOLD ID Gi08831) that have now been sequenced through the Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) program. Here we present a preliminary description of the general features of Burkholderia sp. WSM2232 together with its genome sequence and annotation. The absence of nodulation genes within this genome explains the nodulation-negative (Nod-) phenotype of the laboratory-cultured strain. The genomes of WSM2232 and WSM2230 will be an important resource for identifying the processes that enable such isolates to adapt to the infertile, highly acidic soils that dominate the Australian landscape.

Classification and features 

Burkholderia sp. strain WSM2232 is a motile, non-sporulating, non-encapsulated, Gram-negative rod in the order Burkholderiales of the class Betaproteobacteria. The rods vary in size, measuring 0.25-0.5 µm in width and 0.5-2.0 µm in length (Figure 1A and 1B).

It is fast growing, forming colonies within 1-2 days when grown on LB agar [15] devoid of NaCl and within 3-4 days when grown on half-strength Lupin Agar (½LA) [16], tryptone-yeast extract agar (TY) [17] or a modified yeast-mannitol agar (YMA) [18] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins.

Burkholderia sp. WSM2232 falls into a large clade containing PGP, bioremediation and legume-microsymbiont species. WSM2232 demonstrates PGP phenotypes, including phosphate solubilization and hydroxamate-like siderophore production, and is acid tolerant, growing in the pH range 4.5-9.0 (Walker, unpublished).

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of Burkholderia sp. strain WSM2232 in a 16S rRNA sequence-based tree. This strain shares 99% (1352/1364 bp) sequence identity with the 16S rRNA gene of the sequenced strain Burkholderia sp. WSM2230 (Gi08831).
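The 99% figure is the ratio of identical positions to aligned positions over the compared 16S rRNA region. A minimal Python check, assuming identity is simply identical/aligned bases (the exact convention, for example how gaps are handled, is not stated in the text):

```python
def percent_identity(identical: int, aligned: int) -> float:
    """Percentage of aligned positions that are identical."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * identical / aligned


# Values quoted in the text for WSM2232 vs WSM2230 (16S rRNA gene).
identity = percent_identity(1352, 1364)
print(f"{identity:.2f}%")  # 99.12%, reported in the text as 99%
```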




Figure 1. Images of Burkholderia sp. strain WSM2232 using scanning (A) and 
transmission (B) electron microscopy. 



http://standardsingenomics.org 






Burkholderia sp. strain WSM2232 



Table 1. Classification and general features of Burkholderia sp. strain WSM2232 according to the MIGS recommendations [19].

MIGS ID | Property | Term | Evidence code(a)
| Current classification | Domain Bacteria | TAS [20]
| | Phylum Proteobacteria | TAS [21]
| | Class Betaproteobacteria | TAS [22,23]
| | Order Burkholderiales | TAS [23,24]
| | Family Burkholderiaceae | TAS [23,25]
| | Genus Burkholderia | TAS [2,26,27]
| | Species Burkholderia sp. | IDA
| | Strain WSM2232 | IDA
| Gram stain | Negative | IDA
| Cell shape | Rod | IDA
| Motility | Motile | IDA
| Sporulation | Non-sporulating | NAS
| Temperature range | Mesophile | IDA
| Optimum temperature | 30°C | IDA
| Salinity | Non-halophile | IDA
MIGS-22 | Oxygen requirement | Aerobic | IDA
| Carbon source | Varied | IDA
| Energy source | Chemoorganotroph | NAS
MIGS-6 | Habitat | Soil, root nodule, on host | IDA
MIGS-15 | Biotic relationship | Free living, symbiotic | IDA
MIGS-14 | Pathogenicity | Non-pathogenic | IDA
| Biosafety level | 1 | TAS
| Isolation | Root nodule of Gastrolobium capitatum | IDA
MIGS-4 | Geographic location | Karijini National Park, Australia | IDA
MIGS-5 | Soil collection date | September, 2001 | IDA
MIGS-4.1 | Latitude | -22.45 | IDA
MIGS-4.2 | Longitude | 117.99 | IDA
MIGS-4.3 | Depth | 0-10 cm | IDA
MIGS-4.4 | Altitude | Not recorded | IDA

a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [28].
[Figure 2 image: maximum-likelihood 16S rRNA tree with bootstrap values; taxa shown include Burkholderia sp. WSM2232 (HQ677910, Gi08832), Burkholderia sp. WSM2230 (HQ677909, Gi08831), Burkholderia graminis C4D1M T (U96939) and other Burkholderia and Cupriavidus strains; scale bar 0.01]



Figure 2. Phylogenetic tree showing the relationship of Burkholderia sp. strain WSM2232 (shown in bold print) to other members of the order Burkholderiales based on aligned sequences of the 16S rRNA gene (1,242 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA [29], version 5. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [30]. Bootstrap analysis [31] with 500 replicates was performed to assess the support for the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [32]. Published genomes are indicated with an asterisk.



Symbiotaxonomy 

Burkholderia sp. WSM2232 formed nodules (Nod+) and fixed N2 (Fix+) with G. capitatum when first isolated, but was Nod- on various other Australian legumes and on Mimosa pudica (Table 2). However, after long-term storage and subsequent culture, it failed to effectively nodulate G. capitatum.
Table 2. Compatibility of Burkholderia sp. WSM2232 with nine legume species for nodulation (Nod) and N2 fixation (Fix).

Species name | Common name | Growth type | Nod | Fix | Reference
Gastrolobium capitatum(a) | Bitter Pea | Perennial | + | + | IDA
Gastrolobium capitatum(b) | Bitter Pea | Perennial | - | - | IDA
Kennedia coccinea | Coral Vine | Perennial | - | - | IDA
Swainsona formosa | Sturt's Desert Pea | Annual | - | - | IDA
Indigofera trita | | Annual | - | - | IDA
Oxylobium robustum | Shaggy Pea | Perennial | - | - | IDA
Acacia acuminata | Jam Wattle | Perennial | - | - | IDA
Acacia paraneura | Weeping Mulga | Perennial | - | - | IDA
Acacia stenophylla | | Perennial | - | - | IDA
Mimosa pudica | Sensitive Plant | Perennial | - | - | IDA

a Result obtained from the trapping experiment. b Authentication result following long-term storage. Evidence codes - IDA: Inferred from Direct Assay, from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project [28].



Phenotype Microarray 

Strain WSM2232 was assayed using the Biolog Phenotype MicroArray® system (plates PM1 to PM3), testing 190 carbon and 95 nitrogen compounds. Plates were purchased from Biolog and tests were carried out according to the manufacturer's instructions. In this system, the irreversible reduction of a tetrazolium dye to formazan is used to report on active metabolism [33]. The results obtained from the colorimetric assay are shown in Table 3.
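Table 3 reports a qualitative call per well. As a small illustration of summarizing such calls, the Python sketch below tallies a handful of PM3 (nitrogen source) wells transcribed from Table 3. The dictionary layout and the treatment of wells with no printed call as "unscored" are our own assumptions, not part of the Biolog software or the authors' workflow.

```python
from collections import Counter

# A few PM3 (nitrogen source) wells transcribed from Table 3.
# An empty string marks a well for which the table prints no call.
PM3_EXCERPT = {
    "Ammonia": "+",
    "Nitrite": "+",
    "Nitrate": "+",
    "Urea": "+",
    "Biuret": "",
    "Ethylamine": "-",
    "Ethylenediamine": "-",
    "Agmatine": "-",
    "Histamine": "-",
}


def tally_calls(calls: dict[str, str]) -> dict[str, int]:
    """Count positive, negative and unscored wells."""
    counts = Counter(calls.values())
    return {
        "positive": counts["+"],
        "negative": counts["-"],
        "unscored": counts[""],
    }


print(tally_calls(PM3_EXCERPT))
```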



Table 3. Reduction of tetrazolium dye by NADH produced by respiring cells of Burkholderia sp. WSM2232 in the Biolog Phenotype Microarray.

PM1 plate compound | PM2 plate compound | PM3 plate compound
L-Arabinose + | Chondroitin Sulfate C | Ammonia +
N-Acetyl-D-Glucosamine + | α-Cyclodextrin | Nitrite +
D-Saccharic Acid + | β-Cyclodextrin | Nitrate +
Succinic Acid + | γ-Cyclodextrin | Urea +
D-Galactose + | Dextrin + | Biuret
L-Aspartic Acid + | Gelatin | L-Alanine +
L-Proline + | Glycogen | L-Arginine +
D-Alanine + | Inulin | L-Asparagine +
D-Trehalose + | Laminarin | L-Aspartic Acid +
D-Mannose + | Mannan | L-Cysteine +
Dulcitol + | Pectin | L-Glutamic Acid +
D-Serine | N-Acetyl-D-Galactosamine + | L-Glutamine +
D-Sorbitol + | N-Acetyl-Neuraminic Acid | Glycine +
Glycerol + | β-D-Allose | L-Histidine +
L-Fucose + | Amygdalin | L-Isoleucine +
D-Glucuronic Acid + | D-Arabinose + | L-Leucine +
D-Gluconic Acid + | D-Arabitol + | L-Lysine +
D,L-α-Glycerol-Phosphate + | L-Arabitol + | L-Methionine +
D-Xylose + | Arbutin | L-Phenylalanine +
L-Lactic Acid + | 2-Deoxy-D-Ribose + | L-Proline +
Formic Acid + | i-Erythritol - | L-Serine +
D-Mannitol + | D-Fucose + | L-Threonine +
L-Glutamic Acid + | 3-O-β-D-Galactopyranosyl-D-Arabinose | L-Tryptophan +
D-Glucose-6-Phosphate + | Gentiobiose - | L-Tyrosine +
D-Galactonic Acid-γ-Lactone + | L-Glucose | L-Valine +
D,L-Malic Acid + | Lactitol - | D-Alanine +
D-Ribose + | D-Melezitose - | D-Asparagine +
Tween 20 + | Maltitol | D-Aspartic Acid +
L-Rhamnose + | α-Methyl-D-Glucoside | D-Glutamic Acid +
D-Fructose + | β-Methyl-D-Galactoside + | D-Lysine +
Acetic Acid + | 3-Methyl Glucose - | D-Serine +
α-D-Glucose | β-Methyl-D-Glucuronic Acid | D-Valine
Maltose | α-Methyl-D-Mannoside | L-Citrulline +
D-Melibiose - | β-Methyl-D-Xyloside - | L-Homoserine +
Thymidine | Palatinose | L-Ornithine +
L-Asparagine + | D-Raffinose - | N-Acetyl-D,L-Glutamic Acid +
D-Aspartic Acid | Salicin | N-Phthaloyl-L-Glutamic Acid
D-Glucosaminic Acid + | Sedoheptulosan - | L-Pyroglutamic Acid +
1,2-Propanediol - | L-Sorbose - | Hydroxylamine +
Tween 40 + | Stachyose | Methylamine +
α-Keto-Glutaric Acid + | D-Tagatose + | N-Amylamine +
α-Keto-Butyric Acid + | Turanose + | N-Butylamine +
α-Methyl-D-Galactoside - | Xylitol + | Ethylamine -
α-D-Lactose | N-Acetyl-D-Glucosaminitol | Ethanolamine +
Lactulose + | γ-Amino Butyric Acid + | Ethylenediamine -
Sucrose - | δ-Amino Valeric Acid + | Putrescine +
Uridine + | Butyric Acid + | Agmatine -
L-Glutamine + | Capric Acid - | Histamine -
m-Tartaric Acid + | Caproic Acid + | β-Phenylethylamine +
D-Glucose-1-Phosphate + | Citraconic Acid + | Tyramine
D-Fructose-6-Phosphate + | Citramalic Acid + | Acetamide +
Tween 80 + | D-Glucosamine + | Formamide +
α-Hydroxy Glutaric Acid-γ-Lactone | 2-Hydroxy Benzoic Acid | Glucuronamide +
α-Hydroxy Butyric Acid + | 4-Hydroxy Benzoic Acid + | D,L-Lactamide +
β-Methyl-D-Glucoside | β-Hydroxy Butyric Acid + | D-Glucosamine +
Adonitol + | γ-Hydroxy Butyric Acid + | D-Galactosamine +
Maltotriose - | α-Keto Valeric Acid - | D-Mannosamine +
2-Deoxy Adenosine | Itaconic Acid | N-Acetyl-D-Glucosamine +
Adenosine + | 5-Keto-D-Gluconic Acid | N-Acetyl-D-Galactosamine
Glycyl-L-Aspartic Acid + | D-Lactic Acid Methyl Ester + | N-Acetyl-D-Mannosamine
Citric Acid + | Malonic Acid + | Adenine +
m-Inositol + | Melibionic Acid + | Adenosine +
D-Threonine - | Oxalic Acid + | Cytidine +
Fumaric Acid + | Oxalomalic Acid + | Cytosine +
Bromo Succinic Acid + | Quinic Acid + | Guanine -
Propionic Acid + | D-Ribono-1,4-Lactone - | Guanosine +
Mucic Acid + | Sebacic Acid + | Thymine +
Glycolic Acid - | Sorbic Acid + | Thymidine -
Glyoxylic Acid + | Succinamic Acid + | Uracil +
D-Cellobiose - | D-Tartaric Acid + | Uridine +
Inosine + | L-Tartaric Acid + | Inosine +
Glycyl-L-Glutamic Acid + | Acetamide - | Xanthine +
Tricarballylic Acid + | L-Alaninamide + | Xanthosine +
L-Serine + | N-Acetyl-L-Glutamic Acid + | Uric Acid +
L-Threonine + | L-Arginine + | Alloxan +
L-Alanine + | Glycine | Allantoin +
L-Alanyl-Glycine + | L-Histidine + | Parabanic Acid +
Acetoacetic Acid + | L-Homoserine + | D,L-α-Amino-N-Butyric Acid +


N-Acetyl-β-D-Mannosamine | Hydroxy-L-Proline + | γ-Amino-N-Butyric Acid +
Mono Methyl Succinate + | L-Isoleucine + | ε-Amino-N-Caproic Acid
Methyl Pyruvate + | L-Leucine + | D,L-α-Amino-Caprylic Acid
D-Malic Acid + | L-Lysine + | δ-Amino-N-Valeric Acid +
L-Malic Acid + | L-Methionine | α-Amino-N-Valeric Acid +
Glycyl-L-Proline + | L-Ornithine + | Ala-Asp +
p-Hydroxy Phenyl Acetic Acid | L-Phenylalanine + | Ala-Gln +
m-Hydroxy Phenyl Acetic Acid | L-Pyroglutamic Acid | Ala-Glu +
Tyramine | L-Valine | Ala-Gly +
D-Psicose | D,L-Carnitine | Ala-His +
L-Lyxose + | Sec-Butylamine + | Ala-Leu +
Glucuronamide | D,L-Octopamine + | Ala-Thr +
Pyruvic Acid + | Putrescine | Gly-Asn +
L-Galactonic Acid-γ-Lactone | Dihydroxy Acetone | Gly-Gln +
D-Galacturonic Acid + | 2,3-Butanediol | Gly-Glu +
Phenylethylamine + | 2,3-Butanone + | Gly-Met +
2-Aminoethanol + | 3-Hydroxy-2-Butanone + | Met-Ala +



Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [32] and a standard-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 4.

Growth conditions and DNA isolation 

Burkholderia sp. strain WSM2232 was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [34]. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method (http://my.jgi.doe.gov/general/index.html).

Genome sequencing and assembly 

The genome of Burkholderia sp. strain WSM2232 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [35]. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 12,244,888 reads totaling 1,837 Mbp.

All general aspects of library construction and sequencing performed at the JGI can be found at http://my.jgi.doe.gov/general/index.html. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun, L., Copeland, A. and Han, J., unpublished). The following steps were then performed for assembly:

(1) Filtered Illumina reads were assembled using 
Velvet [36] (version 1.1.04) 

(2) 1-3 Kbp simulated paired-end reads were created from Velvet contigs using wgsim (https://github.com/lh3/wgsim)

(3) Illumina reads were assembled with simulated 
read pairs using Allpaths-LG [37] (version 
r37348). 

Parameters for assembly steps were: 

1) Velvet (-v -s 51 -e 71 -i 2 -t 1 -f "-shortPaired -fastq $FASTQ" -o "-ins_length 250 -min_contig_lgth 500")

2) wgsim (-e 0 -1 76 -2 76 -r 0 -R 0 -X 0)

3) Allpaths-LG (STD_1,project,assembly,fragment,1,200,35,,,inward,0,0; SIMREADS,project,assembly,jumping,1,,,3000,300,inward,0,0).
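The parameter strings above are terse. The Python sketch below spells out one reading of steps 2 and 3: the wgsim options as an argument list, and the Allpaths-LG string as two rows of a library table in the style of ALLPATHS-LG's in_libs.csv. The column names, the input and output file names, and leaving wgsim's insert-size options unset (they are not reported) are all assumptions for illustration; the commands are only built, never executed.

```python
import csv
import io

# Hypothetical file names: the text does not give them.
CONTIGS = "velvet_contigs.fa"

# Step 2: wgsim options exactly as listed (error rate, read 1/read 2 lengths,
# mutation rate, indel fraction, indel extension). Insert-size options are not
# reported, so none are set here.
wgsim_argv = [
    "wgsim", "-e", "0", "-1", "76", "-2", "76",
    "-r", "0", "-R", "0", "-X", "0",
    CONTIGS, "sim_1.fq", "sim_2.fq",
]

# Step 3: assumed in_libs.csv column layout for Allpaths-LG.
IN_LIBS_HEADER = [
    "library_name", "project_name", "organism_name", "type", "paired",
    "frag_size", "frag_stddev", "insert_size", "insert_stddev",
    "read_orientation", "genomic_start", "genomic_end",
]
IN_LIBS_ROWS = [
    # Real Illumina fragment library: ~200 bp fragments (sd 35).
    ["STD_1", "project", "assembly", "fragment", "1",
     "200", "35", "", "", "inward", "0", "0"],
    # Simulated jumping pairs: 3,000 bp inserts (sd 300).
    ["SIMREADS", "project", "assembly", "jumping", "1",
     "", "", "3000", "300", "inward", "0", "0"],
]


def render_in_libs() -> str:
    """Render the library table as CSV text (empty cells stay empty)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(IN_LIBS_HEADER)
    writer.writerows(IN_LIBS_ROWS)
    return buf.getvalue()


print(" ".join(wgsim_argv))
print(render_in_libs())
```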

The final draft assembly contained 72 contigs in 72 scaffolds. The total size of the genome is 7.2 Mbp and the final assembly is based on 1,837 Mbp of Illumina data, which provides an average 255× coverage of the genome.
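The 255× figure follows from dividing the sequenced bases by the genome size. A quick Python check using the numbers reported above, assuming 12,244,888 counts individual reads (not pairs) and noting that 1,837 Mbp is itself a rounded value:

```python
READS = 12_244_888            # reads from the HiSeq 2000 run
TOTAL_BASES = 1_837 * 10**6   # 1,837 Mbp of Illumina data (rounded in the text)
GENOME_BP = 7_208_311         # assembled genome size

mean_read_length = TOTAL_BASES / READS   # bases per read
coverage = TOTAL_BASES / GENOME_BP       # average fold coverage

print(f"mean read length ~ {mean_read_length:.0f} bp")
print(f"average coverage ~ {coverage:.0f}x")
```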

Genome annotation 

Genes were identified using Prodigal [38] as part of the DOE-JGI annotation pipeline [39], followed by a round of manual curation using the JGI GenePRIMP pipeline [40]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [41] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [42]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [43].
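Prodigal can write its gene calls as nine-column GFF3 records. As a hypothetical illustration of the counting that underlies gene totals such as those in Table 5, the sketch below tallies CDS features per strand from a toy GFF3 string. The records are invented, not WSM2232 annotations, and the JGI pipeline's own tooling is not represented here.

```python
from collections import Counter

# Invented GFF3 records for illustration only (tab-separated, 9 columns).
SAMPLE_GFF = (
    "##gff-version 3\n"
    "scaffold_1\tProdigal\tCDS\t3\t302\t25.3\t+\t0\tID=1_1\n"
    "scaffold_1\tProdigal\tCDS\t420\t1250\t80.1\t-\t0\tID=1_2\n"
    "scaffold_2\tProdigal\tCDS\t10\t900\t60.7\t+\t0\tID=2_1\n"
)


def count_cds_by_strand(gff_text: str) -> Counter:
    """Count CDS features per strand, skipping blank and comment lines."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column 3 is the feature type, column 7 the strand.
        if len(fields) >= 9 and fields[2] == "CDS":
            counts[fields[6]] += 1
    return counts


print(count_cds_by_strand(SAMPLE_GFF))
```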

Genome properties 

The genome is 7,208,311 nucleotides long with 63.11% GC content (Table 5) and comprises 72 scaffolds (Figure 3) of 72 contigs. Of a total of 6,383 genes, 6,322 were protein encoding and 61 were RNA-only encoding genes. The majority of genes (80.90%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 6.
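The percentages quoted here and in Table 5 can be reproduced from the raw counts. In the Python sketch below, the gene-level percentages match Table 5 when the total gene count (6,383), rather than the protein-coding count, is used as the denominator:

```python
# Raw counts as reported in the text and Table 5.
GENOME_BP = 7_208_311
GC_BP = 4_548_885
CODING_BP = 6_203_174
TOTAL_GENES = 6_383
PROTEIN_CODING = 6_322
RNA_GENES = 61
WITH_FUNCTION = 5_164


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in Table 5."""
    return round(100.0 * part / whole, 2)


# Protein-coding and RNA genes should add up to the gene total.
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES

summary = {
    "GC content (% of bp)": pct(GC_BP, GENOME_BP),
    "coding region (% of bp)": pct(CODING_BP, GENOME_BP),
    "protein-coding (% of genes)": pct(PROTEIN_CODING, TOTAL_GENES),
    "RNA genes (% of genes)": pct(RNA_GENES, TOTAL_GENES),
    "function predicted (% of genes)": pct(WITH_FUNCTION, TOTAL_GENES),
}
for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```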
Table 4. Genome sequencing project information for Burkholderia sp. WSM2232.

MIGS ID | Property | Term
MIGS-31 | Finishing quality | Standard draft
MIGS-28 | Libraries used | One Illumina fragment library
MIGS-29 | Sequencing platforms | Illumina HiSeq 2000
MIGS-31.2 | Sequencing coverage | Illumina: 255×
MIGS-30 | Assemblers | Velvet version 1.1.04; Allpaths-LG version r37348
MIGS-32 | Gene calling methods | Prodigal 1.4
| GOLD ID | Gi08832
| NCBI project ID | 182741
| Database: IMG | 2508501125
| Project relevance | Symbiotic N2 fixation, agriculture
Figure 3. Graphical map of the four largest scaffolds of the genome of Burkholderia sp. strain WSM2232. From bottom to top of each scaffold: genes on the forward strand (colored by COG categories as denoted by the IMG platform), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.
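GC skew, the top track of each scaffold in Figure 3, is conventionally computed as (G - C)/(G + C) over sliding windows, alongside the GC content of each window. The IMG plot's window and step sizes are not stated, so the defaults in this Python sketch are arbitrary assumptions:

```python
def gc_profile(seq: str, window: int = 1000, step: int = 1000):
    """Return (start, GC fraction, GC skew) for successive windows.

    GC skew is (G - C) / (G + C); windows without G or C get a skew of 0.0.
    A sequence shorter than the window yields a single window.
    """
    seq = seq.upper()
    out = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc = (g + c) / len(chunk) if chunk else 0.0
        skew = (g - c) / (g + c) if (g + c) else 0.0
        out.append((start, gc, skew))
    return out


for start, gc, skew in gc_profile("GGGCATAT", window=4, step=4):
    print(start, round(gc, 2), round(skew, 2))
```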
Table 5. Genome statistics for Burkholderia sp. strain WSM2232.

Attribute | Value | % of total(a)
Genome size (bp) | 7,208,311 | 100.00
DNA coding region (bp) | 6,203,174 | 86.06
DNA G+C content (bp) | 4,548,885 | 63.11
Number of scaffolds | 72 |
Number of contigs | 72 |
Total genes | 6,383 | 100.00
RNA genes | 61 | 0.96
rRNA operons(b) | 1 | 0.02
Protein-coding genes | 6,322 | 99.04
Genes with function prediction | 5,164 | 80.90
Genes assigned to COGs | 5,151 | 80.70
Genes assigned Pfam domains | 5,425 | 84.99
Genes with signal peptides | 645 | 10.10
Genes with transmembrane helices | 1,497 | 23.45
CRISPR repeats | 1 |

a Total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome.
b 4 copies of 5S, 2 copies of 16S and 1 copy of 23S rRNA.



Table 6. Number of protein coding genes of Burkholderia sp. strain WSM2232 associated with the general COG functional categories.

Code | Value | %age(a) | Description
J | 194 | 3.34 | Translation, ribosomal structure and biogenesis
A | 3 | 0.05 | RNA processing and modification
K | 559 | 9.61 | Transcription
L | 151 | 2.60 | Replication, recombination and repair
B | 1 | 0.0 | Chromatin structure and dynamics
D | 42 | 0.72 | Cell cycle control, cell division and chromosome partitioning
Y | 0 | 0.0 | Nuclear structure
V | 0 | 0.0 | Defense mechanisms
T | 318 | 5.47 | Signal transduction mechanisms
M | 371 | 6.38 | Cell wall/membrane/envelope biogenesis
N | 125 | 2.15 | Cell motility
Z | 0 | 0.00 | Cytoskeleton
W | 2 | 0.03 | Extracellular structures
U | 154 | 2.65 | Intracellular trafficking, secretion, and vesicular transport
O | 183 | 3.15 | Posttranslational modification, protein turnover, chaperones
C | 384 | 6.60 | Energy production and conversion
G | 474 | 8.15 | Carbohydrate transport and metabolism
E | 569 | 9.79 | Amino acid transport and metabolism
F | 100 | 1.72 | Nucleotide transport and metabolism
H | 213 | 3.66 | Coenzyme transport and metabolism
I | 277 | 4.76 | Lipid transport and metabolism
P | 269 | 4.63 | Inorganic ion transport and metabolism
Q | 199 | 3.42 | Secondary metabolite biosynthesis, transport and catabolism
R | 673 | 11.58 | General function prediction only
S | 500 | 8.60 | Function unknown
| 1,232 | 19.30 | Not in COGs

a The total is based on the total number of protein coding genes in the annotated genome.